Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 10865 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.2 MiB |
| Average record size in memory | 112.0 B |
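The two memory figures above agree with each other under a simple assumption. The sketch below assumes every cell occupies 8 bytes (a 64-bit number or an object pointer); the report does not state this, and the constant names are illustrative only.

```python
# Figures taken from the dataset statistics table.
N_ROWS = 10865
N_COLS = 14

# Assumption (not stated in the report): each cell costs 8 bytes.
BYTES_PER_CELL = 8

record_bytes = N_COLS * BYTES_PER_CELL          # bytes per observation
total_mib = N_ROWS * record_bytes / 1024 ** 2   # whole table in MiB

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
```

Under that assumption the result matches the reported 112.0 B per record and 1.2 MiB in total, up to rounding.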
Variable types
| Numeric | 10 |
|---|---|
| Categorical | 4 |
Alerts
| Alert | Type |
|---|---|
| original_title has a high cardinality: 10571 distinct values | High cardinality |
| director has a high cardinality: 5068 distinct values | High cardinality |
| genres has a high cardinality: 2040 distinct values | High cardinality |
| release_date has a high cardinality: 5909 distinct values | High cardinality |
| df_index is highly correlated with release_year | High correlation |
| popularity is highly correlated with revenue and 1 other fields | High correlation |
| budget is highly correlated with revenue and 3 other fields | High correlation |
| revenue is highly correlated with popularity and 4 other fields | High correlation |
| vote_count is highly correlated with popularity and 4 other fields | High correlation |
| release_year is highly correlated with df_index | High correlation |
| budget_adj is highly correlated with budget and 3 other fields | High correlation |
| revenue_adj is highly correlated with budget and 3 other fields | High correlation |
| df_index is uniformly distributed | Uniform |
| original_title is uniformly distributed | Uniform |
| df_index has unique values | Unique |
Reproduction
| Analysis started | 2022-10-07 23:45:50.344572 |
|---|---|
| Analysis finished | 2022-10-07 23:47:21.560679 |
| Duration | 1 minute and 31.22 seconds |
| Software version | pandas-profiling v3.3.0 |
| Download configuration | config.json |
df_index
Real number (ℝ≥0)
| Distinct | 10865 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5432.807639 |
| Minimum | 0 |
|---|---|
| Maximum | 10865 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 85.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 543.2 |
| Q1 | 2717 |
| median | 5433 |
| Q3 | 8149 |
| 95-th percentile | 10321.8 |
| Maximum | 10865 |
| Range | 10865 |
| Interquartile range (IQR) | 5432 |
Descriptive statistics
| Standard deviation | 3136.868785 |
|---|---|
| Coefficient of variation (CV) | 0.5773936781 |
| Kurtosis | -1.199908013 |
| Mean | 5432.807639 |
| Median Absolute Deviation (MAD) | 2716 |
| Skewness | -0.0001828882776 |
| Sum | 59027455 |
| Variance | 9839945.777 |
| Monotonicity | Strictly increasing |
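The statistics above also show which index value is absent. With 10865 distinct values between 0 and 10865 inclusive, exactly one integer in that range is missing, assuming the index holds only integers. A minimal sketch recovers it from the reported sum; the variable names are illustrative.

```python
# Figures taken from the tables above.
n_obs = 10865
maximum = 10865
reported_sum = 59027455

# Sum of every integer 0..maximum if none were missing.
full_sum = maximum * (maximum + 1) // 2

# Assumption: the index is integer-valued, so one value in 0..maximum is absent.
missing_value = full_sum - reported_sum
mean = reported_sum / n_obs

print(missing_value)    # 2090
print(round(mean, 6))   # 5432.807639
```

The recomputed mean agrees with the reported one, which supports reading the index as the integers 0 to 10865 with the single value 2090 left out.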
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 2047 | 1 | < 0.1% |
| 3363 | 1 | < 0.1% |
| 7473 | 1 | < 0.1% |
| 5424 | 1 | < 0.1% |
| 9518 | 1 | < 0.1% |
| 3371 | 1 | < 0.1% |
| 1322 | 1 | < 0.1% |
| 7465 | 1 | < 0.1% |
| 5416 | 1 | < 0.1% |
| 9510 | 1 | < 0.1% |
| Other values (10855) | 10855 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 10865 | 1 | |
| 10864 | 1 | |
| 10863 | 1 | |
| 10862 | 1 | |
| 10861 | 1 | |
| 10860 | 1 | |
| 10859 | 1 | |
| 10858 | 1 | |
| 10857 | 1 | |
| 10856 | 1 |
original_title
Categorical
| Distinct | 10571 |
|---|---|
| Distinct (%) | 97.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 85.0 KiB |
| Hamlet | 4 |
|---|---|
| A Christmas Carol | 3 |
| Shelter | 3 |
| Carrie | 3 |
| Hercules | 3 |
| Other values (10566) |
Length
| Max length | 104 |
|---|---|
| Median length | 70 |
| Mean length | 16.00312931 |
| Min length | 1 |
Characters and Unicode
| Total characters | 173874 |
|---|---|
| Distinct characters | 164 |
| Distinct categories | 19 |
| Distinct scripts | 2 |
| Distinct blocks | 6 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 10295 |
|---|---|
| Unique (%) | 94.8% |
Sample
| 1st row | Jurassic World |
|---|---|
| 2nd row | Mad Max: Fury Road |
| 3rd row | Insurgent |
| 4th row | Star Wars: The Force Awakens |
| 5th row | Furious 7 |
Common Values
| Value | Count | Frequency (%) |
| Hamlet | 4 | < 0.1% |
| A Christmas Carol | 3 | < 0.1% |
| Shelter | 3 | < 0.1% |
| Carrie | 3 | < 0.1% |
| Hercules | 3 | < 0.1% |
| Frankenstein | 3 | < 0.1% |
| Life | 3 | < 0.1% |
| Jane Eyre | 3 | < 0.1% |
| Julia | 3 | < 0.1% |
| Emma | 3 | < 0.1% |
| Other values (10561) | 10834 |
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
| the | 3279 | 10.6% |
| of | 969 | 3.1% |
| a | 386 | 1.2% |
| in | 327 | 1.1% |
| and | 317 | 1.0% |
| to | 227 | 0.7% |
| 2 | 226 | 0.7% |
| 211 | 0.7% | |
| man | 148 | 0.5% |
| for | 113 | 0.4% |
| Other values (8859) | 24842 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 20178 | 11.6% |
| e | 17617 | 10.1% |
| a | 10758 | 6.2% |
| o | 10239 | 5.9% |
| r | 9356 | 5.4% |
| n | 9343 | 5.4% |
| i | 9062 | 5.2% |
| t | 8414 | 4.8% |
| s | 6857 | 3.9% |
| h | 6463 | 3.7% |
| Other values (154) | 65587 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 122010 | |
| Uppercase Letter | 27178 | 15.6% |
| Space Separator | 20195 | 11.6% |
| Other Punctuation | 2612 | 1.5% |
| Decimal Number | 1140 | 0.7% |
| Dash Punctuation | 212 | 0.1% |
| Modifier Symbol | 114 | 0.1% |
| Other Symbol | 101 | 0.1% |
| Currency Symbol | 78 | < 0.1% |
| Other Number | 71 | < 0.1% |
| Other values (9) | 163 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 17617 | |
| a | 10758 | 8.8% |
| o | 10239 | 8.4% |
| r | 9356 | 7.7% |
| n | 9343 | 7.7% |
| i | 9062 | 7.4% |
| t | 8414 | 6.9% |
| s | 6857 | 5.6% |
| h | 6463 | 5.3% |
| l | 5845 | 4.8% |
| Other values (35) | 28056 |
Uppercase Letter
| Value | Count | Frequency (%) |
| T | 3615 | 13.3% |
| S | 2228 | 8.2% |
| B | 1815 | 6.7% |
| M | 1783 | 6.6% |
| C | 1613 | 5.9% |
| D | 1579 | 5.8% |
| A | 1541 | 5.7% |
| L | 1281 | 4.7% |
| H | 1244 | 4.6% |
| P | 1197 | 4.4% |
| Other values (27) | 9282 |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 1097 | |
| ' | 560 | |
| . | 368 | 14.1% |
| , | 154 | 5.9% |
| & | 153 | 5.9% |
| ! | 123 | 4.7% |
| ? | 41 | 1.6% |
| / | 29 | 1.1% |
| ¡ | 14 | 0.5% |
| • | 12 | 0.5% |
| Other values (14) | 61 | 2.3% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 327 | |
| 1 | 172 | |
| 3 | 172 | |
| 0 | 167 | |
| 4 | 81 | 7.1% |
| 5 | 67 | 5.9% |
| 9 | 48 | 4.2% |
| 7 | 42 | 3.7% |
| 6 | 35 | 3.1% |
| 8 | 29 | 2.5% |
Currency Symbol
| Value | Count | Frequency (%) |
| € | 29 | |
| ¤ | 18 | |
| ¢ | 12 | |
| £ | 10 | 12.8% |
| ¥ | 5 | 6.4% |
| $ | 4 | 5.1% |
Other Number
| Value | Count | Frequency (%) |
| ¹ | 22 | |
| ³ | 14 | |
| ¼ | 14 | |
| ½ | 9 | |
| ² | 7 | 9.9% |
| ¾ | 5 | 7.0% |
Modifier Symbol
| Value | Count | Frequency (%) |
| ¸ | 63 | |
| ¨ | 24 | 21.1% |
| ´ | 14 | 12.3% |
| ˜ | 9 | 7.9% |
| ¯ | 4 | 3.5% |
Other Symbol
| Value | Count | Frequency (%) |
| © | 60 | |
| ™ | 20 | 19.8% |
| ° | 11 | 10.9% |
| ¦ | 7 | 6.9% |
| ® | 3 | 3.0% |
Math Symbol
| Value | Count | Frequency (%) |
| ± | 9 | |
| ¬ | 8 | |
| × | 5 | |
| + | 3 | 12.0% |
Final Punctuation
| Value | Count | Frequency (%) |
| » | 8 | |
| ” | 7 | |
| ’ | 3 | 14.3% |
| › | 3 | 14.3% |
Initial Punctuation
| Value | Count | Frequency (%) |
| ‹ | 7 | |
| ‘ | 7 | |
| “ | 5 | |
| « | 5 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 206 | |
| — | 5 | 2.4% |
| – | 1 | 0.5% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 16 | |
| ‚ | 14 | |
| „ | 7 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 20178 | |
| (other space character) | 17 | 0.1% |
Other Letter
| Value | Count | Frequency (%) |
| ª | 13 | |
| º | 8 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 16 |
Modifier Letter
| Value | Count | Frequency (%) |
| ˆ | 11 |
Format
| Value | Count | Frequency (%) |
| | 6 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 149204 | |
| Common | 24670 | 14.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 17617 | 11.8% |
| a | 10758 | 7.2% |
| o | 10239 | 6.9% |
| r | 9356 | 6.3% |
| n | 9343 | 6.3% |
| i | 9062 | 6.1% |
| t | 8414 | 5.6% |
| s | 6857 | 4.6% |
| h | 6463 | 4.3% |
| l | 5845 | 3.9% |
| Other values (73) | 55250 |
Common
| Value | Count | Frequency (%) |
| (space) | 20178 | |
| : | 1097 | 4.4% |
| ' | 560 | 2.3% |
| . | 368 | 1.5% |
| 2 | 327 | 1.3% |
| - | 206 | 0.8% |
| 1 | 172 | 0.7% |
| 3 | 172 | 0.7% |
| 0 | 167 | 0.7% |
| , | 154 | 0.6% |
| Other values (71) | 1269 | 5.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 172791 | |
| None | 915 | 0.5% |
| Punctuation | 99 | 0.1% |
| Currency Symbols | 29 | < 0.1% |
| Letterlike Symbols | 20 | < 0.1% |
| Modifier Letters | 20 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 20178 | 11.7% |
| e | 17617 | 10.2% |
| a | 10758 | 6.2% |
| o | 10239 | 5.9% |
| r | 9356 | 5.4% |
| n | 9343 | 5.4% |
| i | 9062 | 5.2% |
| t | 8414 | 4.9% |
| s | 6857 | 4.0% |
| h | 6463 | 3.7% |
| Other values (73) | 64504 |
None
| Value | Count | Frequency (%) |
| Ã | 162 | 17.7% |
| ¸ | 63 | 6.9% |
| © | 60 | 6.6% |
| à | 50 | 5.5% |
| ì | 31 | 3.4% |
| ¨ | 24 | 2.6% |
| ¹ | 22 | 2.4% |
| å | 21 | 2.3% |
| ¤ | 18 | 2.0% |
| ã | 17 | 1.9% |
| Other values (52) | 447 |
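Characters such as Ã and © in this block are typical of UTF-8 text that was decoded with a single-byte code page. The report does not state the cause, so the following is only a sketch under that assumption; `repair_mojibake` and the sample strings are hypothetical.

```python
def repair_mojibake(text: str) -> str:
    """Try to undo a UTF-8 -> cp1252 mis-decoding.

    Assumption: the text was originally UTF-8 and was read as cp1252.
    If the round trip fails, the input is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Either a character has no cp1252 byte, or the bytes are not
        # valid UTF-8, so the text was probably not mis-decoded this way.
        return text


# Hypothetical examples, not rows from the dataset.
print(repair_mojibake("PokÃ©mon"))  # Pokémon
print(repair_mojibake("Hamlet"))    # Hamlet
```

Plain ASCII titles pass through untouched, and strings that already contain correctly decoded accents are left alone because the round trip raises an error for them.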
Currency Symbols
| Value | Count | Frequency (%) |
| € | 29 |
Letterlike Symbols
| Value | Count | Frequency (%) |
| ™ | 20 |
Punctuation
| Value | Count | Frequency (%) |
| ‚ | 14 | |
| • | 12 | |
| ‰ | 11 | |
| ‡ | 8 | |
| ‹ | 7 | |
| ‘ | 7 | |
| „ | 7 | |
| ” | 7 | |
| … | 7 | |
| “ | 5 | 5.1% |
| Other values (5) | 14 |
Modifier Letters
| Value | Count | Frequency (%) |
| ˆ | 11 | |
| ˜ | 9 |
popularity
Real number (ℝ≥0)
| Distinct | 10814 |
|---|---|
| Distinct (%) | 99.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6464455549 |
| Minimum | 6.5 × 10⁻⁵ |
|---|---|
| Maximum | 32.985763 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 85.0 KiB |
Quantile statistics
| Minimum | 6.5 × 10⁻⁵ |
|---|---|
| 5-th percentile | 0.0642494 |
| Q1 | 0.207575 |
| median | 0.383831 |
| Q3 | 0.713857 |
| 95-th percentile | 2.0466582 |
| Maximum | 32.985763 |
| Range | 32.985698 |
| Interquartile range (IQR) | 0.506282 |
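The interquartile range above is simply Q3 minus Q1. As an illustration of how these quantiles can be used, the sketch below also derives Tukey's fences (1.5 × IQR beyond the quartiles), a common outlier rule that the report itself does not apply; the variable names are illustrative.

```python
# Quantiles taken from the table above.
q1 = 0.207575
q3 = 0.713857
p95 = 2.0466582

iqr = q3 - q1

# Tukey's fences: a conventional rule of thumb, not part of the report.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR:         {iqr:.6f}")
print(f"lower fence: {lower_fence:.6f}")
print(f"upper fence: {upper_fence:.6f}")
print("95th percentile above upper fence:", p95 > upper_fence)
```

Because the 95th percentile already exceeds the upper fence, this rule would flag at least the top 5% of observations, so it is probably too aggressive for a distribution this skewed.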
Descriptive statistics
| Standard deviation | 1.00023085 |
|---|---|
| Coefficient of variation (CV) | 1.547277791 |
| Kurtosis | 210.9783633 |
| Mean | 0.6464455549 |
| Median Absolute Deviation (MAD) | 0.215396 |
| Skewness | 9.875866523 |
| Sum | 7023.630954 |
| Variance | 1.000461754 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 0.109305 | 2 | < 0.1% |
| 0.114027 | 2 | < 0.1% |
| 0.468552 | 2 | < 0.1% |
| 0.150035 | 2 | < 0.1% |
| 0.187319 | 2 | < 0.1% |
| 0.186995 | 2 | < 0.1% |
| 0.265119 | 2 | < 0.1% |
| 0.011798 | 2 | < 0.1% |
| 0.138861 | 2 | < 0.1% |
| 0.078482 | 2 | < 0.1% |
| Other values (10804) | 10845 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 6.5 × 10⁻⁵ | 1 | |
| 0.000188 | 1 | |
| 0.00062 | 1 | |
| 0.000973 | 1 | |
| 0.001115 | 1 | |
| 0.001117 | 1 | |
| 0.001315 | 1 | |
| 0.001317 | 1 | |
| 0.001349 | 1 | |
| 0.001372 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 32.985763 | 1 | |
| 28.419936 | 1 | |
| 24.949134 | 1 | |
| 14.311205 | 1 | |
| 13.112507 | 1 | |
| 12.971027 | 1 | |
| 12.037933 | 1 | |
| 11.422751 | 1 | |
| 11.173104 | 1 | |
| 10.739009 | 1 |
budget
Real number (ℝ≥0)
| Distinct | 557 |
|---|---|
| Distinct (%) | 5.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22291100 |
| Minimum | 1 |
|---|---|
| Maximum | 425000000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 85.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1700000 |
| Q1 | 14624286.06 |
| median | 14624286.06 |
| Q3 | 15000000 |
| 95-th percentile | 75000000 |
| Maximum | 425000000 |
| Range | 424999999 |
| Interquartile range (IQR) | 375713.9357 |
Descriptive statistics
| Standard deviation | 28013845.61 |
|---|---|
| Coefficient of variation (CV) | 1.256727824 |
| Kurtosis | 24.90082452 |
| Mean | 22291100 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.228529025 |
| Sum | 2.421928015 × 10¹¹ |
| Variance | 7.847755458 × 10¹⁴ |
| Monotonicity | Not monotonic |
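A median absolute deviation of 0 together with Q1 equal to the median is what happens when a single value covers more than half of the rows. The most common value reported for this variable, 14624286.06, occurs 5696 times. That might indicate an imputed placeholder, though the report does not say so. The sketch below uses a small synthetic sample, not the real data, to show the effect.

```python
from statistics import median


def mad(values):
    """Median absolute deviation around the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


# Share of rows holding the most common value (counts from the report).
share = 5696 / 10865

# Synthetic sample (hypothetical): 6 of 10 values are identical.
sample = [14624286.06] * 6 + [1.0, 5_000_000.0, 20_000_000.0, 425_000_000.0]

print(f"share of rows: {share:.1%}")
print("MAD of sample:", mad(sample))
```

Whenever one value holds a strict majority, more than half of the absolute deviations are zero, so their median is zero too.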
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 14624286.06 | 5696 | |
| 20000000 | 190 | 1.7% |
| 15000000 | 183 | 1.7% |
| 25000000 | 178 | 1.6% |
| 10000000 | 176 | 1.6% |
| 30000000 | 164 | 1.5% |
| 5000000 | 141 | 1.3% |
| 40000000 | 134 | 1.2% |
| 35000000 | 128 | 1.2% |
| 12000000 | 120 | 1.1% |
| Other values (547) | 3755 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 4 | |
| 2 | 1 | < 0.1% |
| 3 | 3 | |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 8 | 3 | |
| 10 | 6 | |
| 11 | 1 | < 0.1% |
| 12 | 2 | < 0.1% |
| 14 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 425000000 | 1 | < 0.1% |
| 380000000 | 1 | < 0.1% |
| 300000000 | 1 | < 0.1% |
| 280000000 | 1 | < 0.1% |
| 270000000 | 1 | < 0.1% |
| 260000000 | 2 | < 0.1% |
| 258000000 | 1 | < 0.1% |
| 255000000 | 1 | < 0.1% |
| 250000000 | 7 | |
| 245000000 | 1 | < 0.1% |
revenue
Real number (ℝ≥0)
| Distinct | 4702 |
|---|---|
| Distinct (%) | 43.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 61879229.88 |
| Minimum | 2 |
|---|---|
| Maximum | 2781505847 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 85.0 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 1069466 |
| Q1 | 39826896.08 |
| median | 39826896.08 |
| Q3 | 39826896.08 |
| 95-th percentile | 213723484 |
| Maximum | 2781505847 |
| Range | 2781505845 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 111023555.9 |
|---|---|
| Coefficient of variation (CV) | 1.79419744 |
| Kurtosis | 84.97811557 |
| Mean | 61879229.88 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.215638107 |
| Sum | 6.723178327 × 10¹¹ |
| Variance | 1.232622995 × 10¹⁶ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 39826896.08 | 6016 | |
| 12000000 | 10 | 0.1% |
| 10000000 | 8 | 0.1% |
| 11000000 | 7 | 0.1% |
| 5000000 | 6 | 0.1% |
| 2000000 | 6 | 0.1% |
| 6000000 | 6 | 0.1% |
| 20000000 | 5 | < 0.1% |
| 30000000 | 5 | < 0.1% |
| 13000000 | 5 | < 0.1% |
| Other values (4692) | 4791 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 2 | 2 | |
| 3 | 3 | |
| 5 | 2 | |
| 6 | 2 | |
| 9 | 2 | |
| 10 | 1 | < 0.1% |
| 11 | 3 | |
| 12 | 1 | < 0.1% |
| 13 | 2 | |
| 15 | 3 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 2781505847 | 1 | |
| 2068178225 | 1 | |
| 1845034188 | 1 | |
| 1519557910 | 1 | |
| 1513528810 | 1 | |
| 1506249360 | 1 | |
| 1405035767 | 1 | |
| 1327817822 | 1 | |
| 1274219009 | 1 | |
| 1215439994 | 1 |
director
Categorical
| Distinct | 5068 |
|---|---|
| Distinct (%) | 46.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 85.0 KiB |
| Woody Allen | 45 |
|---|---|
| other | 44 |
| Clint Eastwood | 34 |
| Martin Scorsese | 29 |
| Steven Spielberg | 29 |
| Other values (5063) |
Length
| Max length | 533 |
|---|---|
| Median length | 169 |
| Mean length | 14.5192821 |
| Min length | 2 |
Characters and Unicode
| Total characters | 157752 |
|---|---|
| Distinct characters | 96 |
| Distinct categories | 18 |
| Distinct scripts | 2 |
| Distinct blocks | 5 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 3217 |
|---|---|
| Unique (%) | 29.6% |
Sample
| 1st row | Colin Trevorrow |
|---|---|
| 2nd row | George Miller |
| 3rd row | Robert Schwentke |
| 4th row | J.J. Abrams |
| 5th row | James Wan |
Common Values
| Value | Count | Frequency (%) |
| Woody Allen | 45 | 0.4% |
| other | 44 | 0.4% |
| Clint Eastwood | 34 | 0.3% |
| Martin Scorsese | 29 | 0.3% |
| Steven Spielberg | 29 | 0.3% |
| Ridley Scott | 23 | 0.2% |
| Ron Howard | 22 | 0.2% |
| Steven Soderbergh | 22 | 0.2% |
| Joel Schumacher | 21 | 0.2% |
| Brian De Palma | 20 | 0.2% |
| Other values (5058) | 10576 |
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
| john | 436 | 1.8% |
| michael | 308 | 1.3% |
| david | 301 | 1.3% |
| robert | 212 | 0.9% |
| peter | 201 | 0.8% |
| james | 162 | 0.7% |
| richard | 159 | 0.7% |
| paul | 144 | 0.6% |
| mark | 110 | 0.5% |
| lee | 107 | 0.4% |
| Other values (6203) | 21641 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 14674 | 9.3% |
| (space) | 12932 | 8.2% |
| a | 12669 | 8.0% |
| n | 11046 | 7.0% |
| r | 10732 | 6.8% |
| o | 9204 | 5.8% |
| i | 9106 | 5.8% |
| l | 7510 | 4.8% |
| t | 5516 | 3.5% |
| s | 5207 | 3.3% |
| Other values (86) | 59156 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 116745 | |
| Uppercase Letter | 25715 | 16.3% |
| Space Separator | 12934 | 8.2% |
| Math Symbol | 1088 | 0.7% |
| Other Punctuation | 838 | 0.5% |
| Dash Punctuation | 178 | 0.1% |
| Other Symbol | 118 | 0.1% |
| Format | 35 | < 0.1% |
| Other Number | 32 | < 0.1% |
| Modifier Symbol | 28 | < 0.1% |
| Other values (8) | 41 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 2357 | 9.2% |
| J | 2208 | 8.6% |
| M | 2207 | 8.6% |
| R | 1800 | 7.0% |
| B | 1686 | 6.6% |
| C | 1612 | 6.3% |
| D | 1483 | 5.8% |
| A | 1433 | 5.6% |
| G | 1264 | 4.9% |
| L | 1222 | 4.8% |
| Other values (20) | 8443 |
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 14674 | |
| a | 12669 | |
| n | 11046 | |
| r | 10732 | |
| o | 9204 | 7.9% |
| i | 9106 | 7.8% |
| l | 7510 | 6.4% |
| t | 5516 | 4.7% |
| s | 5207 | 4.5% |
| h | 4588 | 3.9% |
| Other values (16) | 26493 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 644 | |
| ¡ | 65 | 7.8% |
| ' | 52 | 6.2% |
| ¶ | 30 | 3.6% |
| , | 18 | 2.1% |
| § | 15 | 1.8% |
| ‡ | 5 | 0.6% |
| ‰ | 5 | 0.6% |
| … | 4 | 0.5% |
Modifier Symbol
| Value | Count | Frequency (%) |
| ¨ | 12 | |
| ´ | 9 | |
| ¸ | 5 | |
| ¯ | 2 | 7.1% |
Math Symbol
| Value | Count | Frequency (%) |
| | | 1070 | |
| ± | 17 | 1.6% |
| ¬ | 1 | 0.1% |
Other Symbol
| Value | Count | Frequency (%) |
| © | 112 | |
| ¦ | 5 | 4.2% |
| ™ | 1 | 0.8% |
Currency Symbol
| Value | Count | Frequency (%) |
| ¥ | 11 | |
| ¤ | 7 | |
| € | 1 | 5.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 12932 | |
| (other space character) | 2 | < 0.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 177 | |
| – | 1 | 0.6% |
Other Number
| Value | Count | Frequency (%) |
| ³ | 27 | |
| ¼ | 5 | 15.6% |
Open Punctuation
| Value | Count | Frequency (%) |
| ‚ | 4 | |
| ( | 1 | 20.0% |
Final Punctuation
| Value | Count | Frequency (%) |
| » | 3 | |
| › | 1 | 25.0% |
Initial Punctuation
| Value | Count | Frequency (%) |
| « | 3 | |
| ‘ | 1 | 25.0% |
Other Letter
| Value | Count | Frequency (%) |
| º | 3 | |
| ª | 1 | 25.0% |
Format
| Value | Count | Frequency (%) |
| | 35 |
Control
| Value | Count | Frequency (%) |
| 3 |
Decimal Number
| Value | Count | Frequency (%) |
| 9 | 1 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 142464 | |
| Common | 15288 | 9.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 14674 | 10.3% |
| a | 12669 | 8.9% |
| n | 11046 | 7.8% |
| r | 10732 | 7.5% |
| o | 9204 | 6.5% |
| i | 9106 | 6.4% |
| l | 7510 | 5.3% |
| t | 5516 | 3.9% |
| s | 5207 | 3.7% |
| h | 4588 | 3.2% |
| Other values (48) | 52212 |
Common
| Value | Count | Frequency (%) |
| (space) | 12932 | |
| | | 1070 | 7.0% |
| . | 644 | 4.2% |
| - | 177 | 1.2% |
| © | 112 | 0.7% |
| ¡ | 65 | 0.4% |
| ' | 52 | 0.3% |
| | 35 | 0.2% |
| ¶ | 30 | 0.2% |
| ³ | 27 | 0.2% |
| Other values (28) | 144 | 0.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 156950 | |
| None | 779 | 0.5% |
| Punctuation | 21 | < 0.1% |
| Letterlike Symbols | 1 | < 0.1% |
| Currency Symbols | 1 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 14674 | 9.3% |
| (space) | 12932 | 8.2% |
| a | 12669 | 8.1% |
| n | 11046 | 7.0% |
| r | 10732 | 6.8% |
| o | 9204 | 5.9% |
| i | 9106 | 5.8% |
| l | 7510 | 4.8% |
| t | 5516 | 3.5% |
| s | 5207 | 3.3% |
| Other values (52) | 58354 |
None
| Value | Count | Frequency (%) |
| Ã | 377 | |
| © | 112 | 14.4% |
| ¡ | 65 | 8.3% |
| | 35 | 4.5% |
| ¶ | 30 | 3.9% |
| ³ | 27 | 3.5% |
| ± | 17 | 2.2% |
| Å | 15 | 1.9% |
| § | 15 | 1.9% |
| Ä | 14 | 1.8% |
| Other values (15) | 72 | 9.2% |
Punctuation
| Value | Count | Frequency (%) |
| ‡ | 5 | |
| ‰ | 5 | |
| … | 4 | |
| ‚ | 4 | |
| – | 1 | 4.8% |
| › | 1 | 4.8% |
| ‘ | 1 | 4.8% |
Letterlike Symbols
| Value | Count | Frequency (%) |
| ™ | 1 |
Currency Symbols
| Value | Count | Frequency (%) |
| € | 1 |
runtime
Real number (ℝ≥0)
| Distinct | 247 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 102.0717902 |
| Minimum | 0 |
|---|---|
| Maximum | 900 |
| Zeros | 31 |
| Zeros (%) | 0.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 85.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 75 |
| Q1 | 90 |
| median | 99 |
| Q3 | 111 |
| 95-th percentile | 139 |
| Maximum | 900 |
| Range | 900 |
| Interquartile range (IQR) | 21 |
Descriptive statistics
| Standard deviation | 31.38270058 |
|---|---|
| Coefficient of variation (CV) | 0.3074571391 |
| Kurtosis | 116.2281369 |
| Mean | 102.0717902 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 6.103513226 |
| Sum | 1109010 |
| Variance | 984.8738958 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 90 | 547 | 5.0% |
| 95 | 358 | 3.3% |
| 100 | 335 | 3.1% |
| 93 | 328 | 3.0% |
| 97 | 306 | 2.8% |
| 96 | 300 | 2.8% |
| 91 | 297 | 2.7% |
| 94 | 292 | 2.7% |
| 98 | 270 | 2.5% |
| 88 | 270 | 2.5% |
| Other values (237) | 7562 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 31 | |
| 2 | 5 | < 0.1% |
| 3 | 11 | 0.1% |
| 4 | 17 | |
| 5 | 17 | |
| 6 | 22 | |
| 7 | 17 | |
| 8 | 9 | 0.1% |
| 9 | 7 | 0.1% |
| 10 | 6 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 900 | 1 | |
| 877 | 1 | |
| 705 | 1 | |
| 566 | 1 | |
| 561 | 1 | |
| 550 | 1 | |
| 540 | 1 | |
| 501 | 1 | |
| 500 | 1 | |
| 470 | 1 |
genres
Categorical
| Distinct | 2040 |
|---|---|
| Distinct (%) | 18.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 85.0 KiB |
| Comedy | 712 |
|---|---|
| Drama | 712 |
| Documentary | 312 |
| Drama|Romance | 289 |
| Comedy|Drama | 280 |
| Other values (2035) |
Length
| Max length | 51 |
|---|---|
| Median length | 44 |
| Mean length | 18.50621261 |
| Min length | 3 |
Characters and Unicode
| Total characters | 201070 |
|---|---|
| Distinct characters | 30 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 1226 |
|---|---|
| Unique (%) | 11.3% |
Sample
| 1st row | Action|Adventure|Science Fiction|Thriller |
|---|---|
| 2nd row | Action|Adventure|Science Fiction|Thriller |
| 3rd row | Adventure|Science Fiction|Thriller |
| 4th row | Action|Adventure|Science Fiction|Fantasy |
| 5th row | Action|Crime|Thriller |
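Each value in this column packs several genres into one pipe-separated string, which is why the column has so many distinct values. A minimal sketch of splitting the labels before counting, using only the five sample rows above; `count_genres` is a hypothetical helper name.

```python
from collections import Counter

# The five sample rows shown above.
sample_rows = [
    "Action|Adventure|Science Fiction|Thriller",
    "Action|Adventure|Science Fiction|Thriller",
    "Adventure|Science Fiction|Thriller",
    "Action|Adventure|Science Fiction|Fantasy",
    "Action|Crime|Thriller",
]


def count_genres(rows, sep="|"):
    """Count individual genre labels across pipe-separated strings."""
    counts = Counter()
    for row in rows:
        # Skip empty fragments such as those produced by a trailing separator.
        counts.update(part.strip() for part in row.split(sep) if part.strip())
    return counts


genre_counts = count_genres(sample_rows)
print(genre_counts.most_common())
```

On these five rows the four distinct combined strings reduce to six individual labels, which is usually a more useful level for analysis.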
Common Values
| Value | Count | Frequency (%) |
| Comedy | 712 | 6.6% |
| Drama | 712 | 6.6% |
| Documentary | 312 | 2.9% |
| Drama|Romance | 289 | 2.7% |
| Comedy|Drama | 280 | 2.6% |
| Comedy|Romance | 268 | 2.5% |
| Horror|Thriller | 259 | 2.4% |
| Horror | 253 | 2.3% |
| Comedy|Drama|Romance | 222 | 2.0% |
| Drama|Thriller | 138 | 1.3% |
| Other values (2030) | 7420 |
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
| comedy | 712 | 5.8% |
| drama | 712 | 5.8% |
| fiction | 669 | 5.5% |
| documentary | 312 | 2.5% |
| drama|romance | 289 | 2.4% |
| comedy|drama | 280 | 2.3% |
| comedy|romance | 268 | 2.2% |
| horror|thriller | 259 | 2.1% |
| horror | 253 | 2.1% |
| comedy|drama|romance | 222 | 1.8% |
| Other values (1901) | 8285 |
Most occurring characters
| Value | Count | Frequency (%) |
| r | 20620 | 10.3% |
| e | 17204 | 8.6% |
| | | 16113 | 8.0% |
| a | 15784 | 7.9% |
| o | 14323 | 7.1% |
| m | 14069 | 7.0% |
| i | 14058 | 7.0% |
| n | 11212 | 5.6% |
| c | 8711 | 4.3% |
| t | 8551 | 4.3% |
| Other values (20) | 60425 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 155043 | |
| Uppercase Letter | 28518 | 14.2% |
| Math Symbol | 16113 | 8.0% |
| Space Separator | 1396 | 0.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 20620 | |
| e | 17204 | |
| a | 15784 | |
| o | 14323 | |
| m | 14069 | |
| i | 14058 | |
| n | 11212 | |
| c | 8711 | 5.6% |
| t | 8551 | 5.5% |
| y | 8414 | 5.4% |
| Other values (7) | 22097 |
Uppercase Letter
| Value | Count | Frequency (%) |
| D | 5280 | |
| C | 5147 | |
| A | 4554 | |
| F | 3564 | |
| T | 3074 | |
| H | 1971 | 6.9% |
| R | 1712 | 6.0% |
| M | 1385 | 4.9% |
| S | 1229 | 4.3% |
| W | 435 | 1.5% |
Math Symbol
| Value | Count | Frequency (%) |
| | | 16113 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1396 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 183561 | |
| Common | 17509 | 8.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| r | 20620 | |
| e | 17204 | 9.4% |
| a | 15784 | 8.6% |
| o | 14323 | 7.8% |
| m | 14069 | 7.7% |
| i | 14058 | 7.7% |
| n | 11212 | 6.1% |
| c | 8711 | 4.7% |
| t | 8551 | 4.7% |
| y | 8414 | 4.6% |
| Other values (18) | 50615 |
Common
| Value | Count | Frequency (%) |
| | | 16113 | |
| (space) | 1396 | 8.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 201070 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| r | 20620 | 10.3% |
| e | 17204 | 8.6% |
| | | 16113 | 8.0% |
| a | 15784 | 7.9% |
| o | 14323 | 7.1% |
| m | 14069 | 7.0% |
| i | 14058 | 7.0% |
| n | 11212 | 5.6% |
| c | 8711 | 4.3% |
| t | 8551 | 4.3% |
| Other values (20) | 60425 |
release_date
Categorical
| Distinct | 5909 |
|---|---|
| Distinct (%) | 54.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 85.0 KiB |
| 1/1/09 | 28 |
|---|---|
| 1/1/08 | 21 |
| 1/1/07 | 18 |
| 1/1/05 | 16 |
| 10/10/14 | 15 |
| Other values (5904) |
Length
| Max length | 8 |
|---|---|
| Median length | 7 |
| Mean length | 6.960055223 |
| Min length | 6 |
Characters and Unicode
| Total characters | 75621 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 3386 |
|---|---|
| Unique (%) | 31.2% |
Sample
| 1st row | 6/9/15 |
|---|---|
| 2nd row | 5/13/15 |
| 3rd row | 3/18/15 |
| 4th row | 12/15/15 |
| 5th row | 4/1/15 |
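The dates are stored as month/day/two-digit-year strings, so they are profiled as categories and the century is ambiguous. Python's `%y` maps 00 to 68 onto 2000 to 2068, which would place a 1966 release in 2066. The sketch below assumes the latest release year is 2015 and shifts anything later back by a century; `parse_release_date` and the `"1/1/66"` input are hypothetical.

```python
from datetime import date, datetime


def parse_release_date(text: str, latest_year: int = 2015) -> date:
    """Parse an m/d/yy string, resolving the century with a cutoff year.

    Assumption: no release is later than `latest_year`.
    """
    parsed = datetime.strptime(text, "%m/%d/%y").date()
    if parsed.year > latest_year:
        # %y picked the 2000s, but the value must belong to the 1900s.
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed


print(parse_release_date("6/9/15"))  # 2015-06-09
print(parse_release_date("1/1/66"))  # 1966-01-01
```

Converting the column this way would let a profiler treat it as a date rather than as a high-cardinality category.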
Common Values
| Value | Count | Frequency (%) |
| 1/1/09 | 28 | 0.3% |
| 1/1/08 | 21 | 0.2% |
| 1/1/07 | 18 | 0.2% |
| 1/1/05 | 16 | 0.1% |
| 10/10/14 | 15 | 0.1% |
| 1/1/06 | 13 | 0.1% |
| 9/7/12 | 13 | 0.1% |
| 1/1/03 | 13 | 0.1% |
| 1/1/12 | 12 | 0.1% |
| 10/16/15 | 12 | 0.1% |
| Other values (5899) | 10704 |
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
| 1/1/09 | 28 | 0.3% |
| 1/1/08 | 21 | 0.2% |
| 1/1/07 | 18 | 0.2% |
| 1/1/05 | 16 | 0.1% |
| 10/10/14 | 15 | 0.1% |
| 1/1/06 | 13 | 0.1% |
| 9/7/12 | 13 | 0.1% |
| 1/1/03 | 13 | 0.1% |
| 10/16/15 | 12 | 0.1% |
| 10/14/11 | 12 | 0.1% |
| Other values (5899) | 10704 |
Most occurring characters
| Value | Count | Frequency (%) |
| / | 21730 | |
| 1 | 14799 | |
| 2 | 7071 | 9.4% |
| 0 | 6708 | 8.9% |
| 9 | 5065 | 6.7% |
| 8 | 3932 | 5.2% |
| 3 | 3527 | 4.7% |
| 5 | 3295 | 4.4% |
| 7 | 3259 | 4.3% |
| 4 | 3219 | 4.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 53891 | |
| Other Punctuation | 21730 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 14799 | |
| 2 | 7071 | |
| 0 | 6708 | |
| 9 | 5065 | 9.4% |
| 8 | 3932 | 7.3% |
| 3 | 3527 | 6.5% |
| 5 | 3295 | 6.1% |
| 7 | 3259 | 6.0% |
| 4 | 3219 | 6.0% |
| 6 | 3016 | 5.6% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 21730 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 75621 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| / | 21730 | |
| 1 | 14799 | |
| 2 | 7071 | 9.4% |
| 0 | 6708 | 8.9% |
| 9 | 5065 | 6.7% |
| 8 | 3932 | 5.2% |
| 3 | 3527 | 4.7% |
| 5 | 3295 | 4.4% |
| 7 | 3259 | 4.3% |
| 4 | 3219 | 4.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 75621 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| / | 21730 | |
| 1 | 14799 | |
| 2 | 7071 | 9.4% |
| 0 | 6708 | 8.9% |
| 9 | 5065 | 6.7% |
| 8 | 3932 | 5.2% |
| 3 | 3527 | 4.7% |
| 5 | 3295 | 4.4% |
| 7 | 3259 | 4.3% |
| 4 | 3219 | 4.3% |
vote_count
Real number (ℝ≥0)
| Distinct | 1289 |
|---|---|
| Distinct (%) | 11.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 217.3996318 |
| Minimum | 10 |
|---|---|
| Maximum | 9767 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 85.0 KiB |
Quantile statistics
| Minimum | 10 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 17 |
| median | 38 |
| Q3 | 146 |
| 95-th percentile | 1026 |
| Maximum | 9767 |
| Range | 9757 |
| Interquartile range (IQR) | 129 |
Descriptive statistics
| Standard deviation | 575.644627 |
|---|---|
| Coefficient of variation (CV) | 2.647863854 |
| Kurtosis | 53.3557307 |
| Mean | 217.3996318 |
| Median Absolute Deviation (MAD) | 26 |
| Skewness | 6.177000343 |
| Sum | 2362047 |
| Variance | 331366.7366 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 10 | 501 | 4.6% |
| 11 | 474 | 4.4% |
| 12 | 422 | 3.9% |
| 13 | 377 | 3.5% |
| 14 | 323 | 3.0% |
| 15 | 300 | 2.8% |
| 16 | 270 | 2.5% |
| 17 | 256 | 2.4% |
| 18 | 218 | 2.0% |
| 19 | 189 | 1.7% |
| Other values (1279) | 7535 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 10 | 501 | |
| 11 | 474 | |
| 12 | 422 | |
| 13 | 377 | |
| 14 | 323 | |
| 15 | 300 | |
| 16 | 270 | |
| 17 | 256 | |
| 18 | 218 | |
| 19 | 189 | 1.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 9767 | 1 | |
| 8903 | 1 | |
| 8458 | 1 | |
| 8432 | 1 | |
| 7375 | 1 | |
| 7080 | 1 | |
| 6882 | 1 | |
| 6723 | 1 | |
| 6498 | 1 | |
| 6417 | 1 |
vote_average
Real number (ℝ≥0)
| Distinct | 72 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.975011505 |
| Minimum | 1.5 |
|---|---|
| Maximum | 9.2 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 85.0 KiB |
Quantile statistics
| Minimum | 1.5 |
|---|---|
| 5-th percentile | 4.4 |
| Q1 | 5.4 |
| median | 6 |
| Q3 | 6.6 |
| 95-th percentile | 7.4 |
| Maximum | 9.2 |
| Range | 7.7 |
| Interquartile range (IQR) | 1.2 |
Descriptive statistics
| Standard deviation | 0.9351380715 |
|---|---|
| Coefficient of variation (CV) | 0.1565081625 |
| Kurtosis | 0.5439449721 |
| Mean | 5.975011505 |
| Median Absolute Deviation (MAD) | 0.6 |
| Skewness | -0.4361369375 |
| Sum | 64918.5 |
| Variance | 0.8744832128 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 6.1 | 496 | 4.6% |
| 6 | 495 | 4.6% |
| 5.8 | 486 | 4.5% |
| 5.9 | 473 | 4.4% |
| 6.2 | 464 | 4.3% |
| 6.3 | 461 | 4.2% |
| 6.5 | 457 | 4.2% |
| 6.4 | 446 | 4.1% |
| 5.7 | 415 | 3.8% |
| 6.6 | 413 | 3.8% |
| Other values (62) | 6259 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1.5 | 2 | < 0.1% |
| 2 | 1 | < 0.1% |
| 2.1 | 3 | |
| 2.2 | 3 | |
| 2.3 | 2 | < 0.1% |
| 2.4 | 7 | |
| 2.5 | 2 | < 0.1% |
| 2.6 | 3 | |
| 2.7 | 3 | |
| 2.8 | 7 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 9.2 | 1 | < 0.1% |
| 8.9 | 1 | < 0.1% |
| 8.8 | 2 | < 0.1% |
| 8.7 | 1 | < 0.1% |
| 8.6 | 1 | < 0.1% |
| 8.5 | 6 | 0.1% |
| 8.4 | 10 | |
| 8.3 | 10 | |
| 8.2 | 6 | 0.1% |
| 8.1 | 16 |
release_year
Real number (ℝ≥0)
| Distinct | 56 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2001.321859 |
| Minimum | 1960 |
|---|---|
| Maximum | 2015 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 85.0 KiB |
Quantile statistics
| Minimum | 1960 |
|---|---|
| 5-th percentile | 1973 |
| Q1 | 1995 |
| median | 2006 |
| Q3 | 2011 |
| 95-th percentile | 2015 |
| Maximum | 2015 |
| Range | 55 |
| Interquartile range (IQR) | 16 |
Descriptive statistics
| Standard deviation | 12.81325978 |
|---|---|
| Coefficient of variation (CV) | 0.006402398355 |
| Kurtosis | 0.799702805 |
| Mean | 2001.321859 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | -1.204116723 |
| Sum | 21744362 |
| Variance | 164.1796261 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2014 | 700 | 6.4% |
| 2013 | 659 | 6.1% |
| 2015 | 629 | 5.8% |
| 2012 | 588 | 5.4% |
| 2011 | 540 | 5.0% |
| 2009 | 533 | 4.9% |
| 2008 | 496 | 4.6% |
| 2010 | 489 | 4.5% |
| 2007 | 438 | 4.0% |
| 2006 | 408 | 3.8% |
| Other values (46) | 5385 |
| Value | Count | Frequency (%) |
| 1960 | 32 | |
| 1961 | 31 | |
| 1962 | 32 | |
| 1963 | 34 | |
| 1964 | 42 | |
| 1965 | 35 | |
| 1966 | 46 | |
| 1967 | 40 | |
| 1968 | 39 | |
| 1969 | 31 |
| Value | Count | Frequency (%) |
| 2015 | 629 | 5.8% |
| 2014 | 700 | 6.4% |
| 2013 | 659 | 6.1% |
| 2012 | 588 | 5.4% |
| 2011 | 540 | 5.0% |
| 2010 | 489 | 4.5% |
| 2009 | 533 | 4.9% |
| 2008 | 496 | 4.6% |
| 2007 | 438 | 4.0% |
| 2006 | 408 | 3.8% |
budget_adj
Real number (ℝ≥0)
| Distinct | 2614 |
|---|---|
| Distinct (%) | 24.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 26750464.35 |
| Minimum | 0.9210910508 |
|---|---|
| Maximum | 425000000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 85.0 KiB |
Quantile statistics
| Minimum | 0.9210910508 |
|---|---|
| 5-th percentile | 2346207.763 |
| Q1 | 17549894.04 |
| median | 17549894.04 |
| Q3 | 20853251.08 |
| 95-th percentile | 89378476.25 |
| Maximum | 425000000 |
| Range | 424999999.1 |
| Interquartile range (IQR) | 3303357.047 |
Descriptive statistics
| Standard deviation | 30510067.24 |
|---|---|
| Coefficient of variation (CV) | 1.140543463 |
| Kurtosis | 17.6875005 |
| Mean | 26750464.35 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.596341207 |
| Sum | 2.906437952 × 10¹¹ |
| Variance | 9.308642027 × 10¹⁴ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 17549894.04 | 5696 | |
| 21033371.65 | 17 | 0.2% |
| 10164004.34 | 17 | 0.2% |
| 20000000 | 16 | 0.1% |
| 4605455.254 | 15 | 0.1% |
| 33496898.69 | 14 | 0.1% |
| 24234951.06 | 14 | 0.1% |
| 26291714.57 | 13 | 0.1% |
| 20328008.68 | 13 | 0.1% |
| 40656017.36 | 13 | 0.1% |
| Other values (2604) | 5037 |
| Value | Count | Frequency (%) |
| 0.9210910508 | 1 | |
| 0.9693980426 | 1 | |
| 1.012786634 | 1 | |
| 1.309052847 | 1 | |
| 2.908194128 | 1 | |
| 3 | 1 | |
| 4.519284805 | 1 | |
| 4.605455254 | 1 | |
| 5.006695621 | 1 | |
| 8.10229307 | 1 |
| Value | Count | Frequency (%) |
| 425000000 | 1 | |
| 368371256.2 | 1 | |
| 315500574.8 | 1 | |
| 292050672.7 | 1 | |
| 271692064.2 | 1 | |
| 271330494.3 | 1 | |
| 260000000 | 1 | |
| 257599886.7 | 1 | |
| 254100108.5 | 1 | |
| 250419201.7 | 1 |
revenue_adj
Real number (ℝ≥0)
| Distinct | 4840 |
|---|---|
| Distinct (%) | 44.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 79812252.07 |
| Minimum | 2.37070529 |
|---|---|
| Maximum | 2827123750 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 85.0 KiB |
Quantile statistics
| Minimum | 2.37070529 |
|---|---|
| 5-th percentile | 1140131.371 |
| Q1 | 51369001.76 |
| median | 51369001.76 |
| Q3 | 51369001.76 |
| 95-th percentile | 276582848.1 |
| Maximum | 2827123750 |
| Range | 2827123748 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 136564704.8 |
|---|---|
| Coefficient of variation (CV) | 1.711074443 |
| Kurtosis | 74.55950245 |
| Mean | 79812252.07 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.821910006 |
| Sum | 8.671601187 × 10¹¹ |
| Variance | 1.864991859 × 10¹⁶ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 51369001.76 | 6016 | |
| 29106404.28 | 2 | < 0.1% |
| 26331569.65 | 2 | < 0.1% |
| 117753430.8 | 2 | < 0.1% |
| 31721459 | 2 | < 0.1% |
| 1000000 | 2 | < 0.1% |
| 89906740.12 | 2 | < 0.1% |
| 14389144.83 | 2 | < 0.1% |
| 57667591.03 | 2 | < 0.1% |
| 209354710.5 | 2 | < 0.1% |
| Other values (4830) | 4831 |
| Value | Count | Frequency (%) |
| 2.37070529 | 1 | |
| 2.861933734 | 1 | |
| 3.038359901 | 1 | |
| 5.926763224 | 1 | |
| 6.951083695 | 1 | |
| 8.585801203 | 1 | |
| 9.05681977 | 1 | |
| 9.115079704 | 1 | |
| 10 | 1 | |
| 10.29636688 | 1 |
| Value | Count | Frequency (%) |
| 2827123750 | 1 | |
| 2789712242 | 1 | |
| 2506405735 | 1 | |
| 2167324901 | 1 | |
| 1907005842 | 1 | |
| 1902723130 | 1 | |
| 1791694309 | 1 | |
| 1583049536 | 1 | |
| 1574814740 | 1 | |
| 1443191435 | 1 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
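The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code pandas-profiling runs: the helper names `average_ranks` and `spearman_rho` are hypothetical, the toy data is made up, and ties are assumed to receive the mean of the ranks they span.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    # The 1/n factors cancel between numerator and denominator.
    ss_x = sum((a - mean_rx) ** 2 for a in rx)
    ss_y = sum((b - mean_ry) ** 2 for b in ry)
    return cov / sqrt(ss_x * ss_y)


# Made-up toy data: y = x**3 is nonlinear but strictly increasing,
# so the ranks of x and y agree and rho comes out at (numerically) 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
```

Because only the ranks enter the calculation, any strictly increasing transformation of either variable leaves ρ unchanged.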
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
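As a rough sketch of that definition, the following plain-Python function computes r and checks the location and scale invariance mentioned above. The name `pearson_r` and the toy values are illustrative assumptions, not part of the report.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # The 1/n factors in the covariance and in both standard deviations
    # cancel, so sums of squared deviations are sufficient.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Made-up toy values, not taken from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)

# Shifting and rescaling each variable separately (with positive scale
# factors) should leave r unchanged up to floating-point error.
r_shifted = pearson_r([2 * a + 3 for a in x], [10 * b - 1 for b in y])
```

Note that a negative scale factor applied to one variable would flip the sign of r while keeping its magnitude.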
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
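The pair-counting rule above corresponds to the variant usually called tau-a, which can be sketched as follows. The function name `kendall_tau_a` and the toy data are assumptions for illustration; libraries such as SciPy default to the tau-b variant, which additionally corrects for ties, so results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; it counts toward neither group
    return (concordant - discordant) / pairs


# Made-up toy data: of the 6 pairs, 5 are concordant and 1 is discordant
# (the swapped middle values), so tau = (5 - 1) / 6.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```

This direct version compares every pair and therefore runs in quadratic time, which is fine for a sketch but slow for a dataset of this size.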
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | df_index | original_title | popularity | budget | revenue | director | runtime | genres | release_date | vote_count | vote_average | release_year | budget_adj | revenue_adj |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Jurassic World | 32.985763 | 150000000.0 | 1.513529e+09 | Colin Trevorrow | 124 | Action\|Adventure\|Science Fiction\|Thriller | 6/9/15 | 5562 | 6.5 | 2015 | 1.379999e+08 | 1.392446e+09 |
| 1 | 1 | Mad Max: Fury Road | 28.419936 | 150000000.0 | 3.784364e+08 | George Miller | 120 | Action\|Adventure\|Science Fiction\|Thriller | 5/13/15 | 6185 | 7.1 | 2015 | 1.379999e+08 | 3.481613e+08 |
| 2 | 2 | Insurgent | 13.112507 | 110000000.0 | 2.952382e+08 | Robert Schwentke | 119 | Adventure\|Science Fiction\|Thriller | 3/18/15 | 2480 | 6.3 | 2015 | 1.012000e+08 | 2.716190e+08 |
| 3 | 3 | Star Wars: The Force Awakens | 11.173104 | 200000000.0 | 2.068178e+09 | J.J. Abrams | 136 | Action\|Adventure\|Science Fiction\|Fantasy | 12/15/15 | 5292 | 7.5 | 2015 | 1.839999e+08 | 1.902723e+09 |
| 4 | 4 | Furious 7 | 9.335014 | 190000000.0 | 1.506249e+09 | James Wan | 137 | Action\|Crime\|Thriller | 4/1/15 | 2947 | 7.3 | 2015 | 1.747999e+08 | 1.385749e+09 |
| 5 | 5 | The Revenant | 9.110700 | 135000000.0 | 5.329505e+08 | Alejandro González Iñárritu | 156 | Western\|Drama\|Adventure\|Thriller | 12/25/15 | 3929 | 7.2 | 2015 | 1.241999e+08 | 4.903142e+08 |
| 6 | 6 | Terminator Genisys | 8.654359 | 155000000.0 | 4.406035e+08 | Alan Taylor | 125 | Science Fiction\|Action\|Thriller\|Adventure | 6/23/15 | 2598 | 5.8 | 2015 | 1.425999e+08 | 4.053551e+08 |
| 7 | 7 | The Martian | 7.667400 | 108000000.0 | 5.953803e+08 | Ridley Scott | 141 | Drama\|Adventure\|Science Fiction | 9/30/15 | 4572 | 7.6 | 2015 | 9.935996e+07 | 5.477497e+08 |
| 8 | 8 | Minions | 7.404165 | 74000000.0 | 1.156731e+09 | Kyle Balda\|Pierre Coffin | 91 | Family\|Animation\|Adventure\|Comedy | 6/17/15 | 2893 | 6.5 | 2015 | 6.807997e+07 | 1.064192e+09 |
| 9 | 9 | Inside Out | 6.326804 | 175000000.0 | 8.537086e+08 | Pete Docter | 94 | Comedy\|Animation\|Family | 6/9/15 | 3935 | 8.0 | 2015 | 1.609999e+08 | 7.854116e+08 |
Last rows
| | df_index | original_title | popularity | budget | revenue | director | runtime | genres | release_date | vote_count | vote_average | release_year | budget_adj | revenue_adj |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10855 | 10856 | The Ugly Dachshund | 0.140934 | 1.462429e+07 | 3.982690e+07 | Norman Tokar | 93 | Comedy\|Drama\|Family | 2/16/66 | 14 | 5.7 | 1966 | 1.754989e+07 | 5.136900e+07 |
| 10856 | 10857 | Nevada Smith | 0.131378 | 1.462429e+07 | 3.982690e+07 | Henry Hathaway | 128 | Action\|Western | 6/10/66 | 10 | 5.9 | 1966 | 1.754989e+07 | 5.136900e+07 |
| 10857 | 10858 | The Russians Are Coming, The Russians Are Coming | 0.317824 | 1.462429e+07 | 3.982690e+07 | Norman Jewison | 126 | Comedy\|War | 5/25/66 | 11 | 5.5 | 1966 | 1.754989e+07 | 5.136900e+07 |
| 10858 | 10859 | Seconds | 0.089072 | 1.462429e+07 | 3.982690e+07 | John Frankenheimer | 100 | Mystery\|Science Fiction\|Thriller\|Drama | 10/5/66 | 22 | 6.6 | 1966 | 1.754989e+07 | 5.136900e+07 |
| 10859 | 10860 | Carry On Screaming! | 0.087034 | 1.462429e+07 | 3.982690e+07 | Gerald Thomas | 87 | Comedy | 5/20/66 | 13 | 7.0 | 1966 | 1.754989e+07 | 5.136900e+07 |
| 10860 | 10861 | The Endless Summer | 0.080598 | 1.462429e+07 | 3.982690e+07 | Bruce Brown | 95 | Documentary | 6/15/66 | 11 | 7.4 | 1966 | 1.754989e+07 | 5.136900e+07 |
| 10861 | 10862 | Grand Prix | 0.065543 | 1.462429e+07 | 3.982690e+07 | John Frankenheimer | 176 | Action\|Adventure\|Drama | 12/21/66 | 20 | 5.7 | 1966 | 1.754989e+07 | 5.136900e+07 |
| 10862 | 10863 | Beregis Avtomobilya | 0.065141 | 1.462429e+07 | 3.982690e+07 | Eldar Ryazanov | 94 | Mystery\|Comedy | 1/1/66 | 11 | 6.5 | 1966 | 1.754989e+07 | 5.136900e+07 |
| 10863 | 10864 | What's Up, Tiger Lily? | 0.064317 | 1.462429e+07 | 3.982690e+07 | Woody Allen | 80 | Action\|Comedy | 11/2/66 | 22 | 5.4 | 1966 | 1.754989e+07 | 5.136900e+07 |
| 10864 | 10865 | Manos: The Hands of Fate | 0.035919 | 1.900000e+04 | 3.982690e+07 | Harold P. Warren | 74 | Horror | 11/15/66 | 15 | 1.5 | 1966 | 1.276423e+05 | 5.136900e+07 |